An LSH Index for Computing Kendall's Tau over Top-k Lists
Abstract
We consider the problem of similarity search within a set of top-k lists under the Kendall's Tau distance function. This distance describes how closely two rankings are related in terms of concordantly and discordantly ordered pairs of items. Because top-k lists are usually very short compared to the global domain of items that can be ranked, an inverted index can be used to look up overlapping lists, but it does not capture the similarity measure tightly enough. In this work, we investigate locality sensitive hashing (LSH) schemes for the Kendall's Tau distance and evaluate the proposed methods on two real-world datasets.
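As an illustration of the distance described above, the following is a minimal sketch that counts discordant pairs between two top-k lists. The abstract does not state how items missing from one of the lists are handled, so this sketch assumes the common generalization of Kendall's Tau to top-k lists with a penalty parameter `p`: an item absent from a list is treated as ranked below every item present in it, and a pair whose order cannot be inferred from one of the lists costs `p`. The function name `kendall_tau_topk` and the default `p=0.5` are hypothetical choices made for this example, not taken from the paper.

```python
from itertools import combinations


def _order(rank, i, j):
    """Return -1 if i is ahead of j, 1 if j is ahead of i, 0 if unknown.

    Assumption: an item missing from the list is ranked below all items
    that are present; if both are missing, their order is unknown.
    """
    ri, rj = rank.get(i), rank.get(j)
    if ri is None and rj is None:
        return 0
    if rj is None:
        return -1
    if ri is None:
        return 1
    return -1 if ri < rj else 1


def kendall_tau_topk(a, b, p=0.5):
    """Kendall's Tau distance between two top-k lists (illustrative sketch).

    a, b: lists of distinct hashable items, best-ranked item first.
    p:    penalty for a pair that appears in only one of the two lists.
    """
    rank_a = {item: pos for pos, item in enumerate(a)}
    rank_b = {item: pos for pos, item in enumerate(b)}
    distance = 0.0
    for i, j in combinations(set(a) | set(b), 2):
        oa, ob = _order(rank_a, i, j), _order(rank_b, i, j)
        if oa == 0 or ob == 0:
            distance += p      # order cannot be inferred from one list
        elif oa != ob:
            distance += 1.0    # discordant pair
    return distance


if __name__ == "__main__":
    print(kendall_tau_topk(["a", "b", "c"], ["c", "b", "a"]))
    print(kendall_tau_topk(["a", "b", "c"], ["a", "c", "d"]))
```

The pairwise loop is quadratic in the size of the union of the two lists, which is acceptable for short top-k lists but is exactly the per-comparison cost that an index is meant to avoid paying for every list in the collection.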
Related resources
RankReduce - Processing K-Nearest Neighbor Queries on Top of MapReduce
We consider the problem of processing K-Nearest Neighbor (KNN) queries over large datasets where the index is jointly maintained by a set of machines in a computing cluster. The proposed RankReduce approach uses locality sensitive hashing (LSH) together with a MapReduce implementation, which by design is a perfect match as the hashing principle of LSH can be smoothly integrated in the mapping p...
A New Weighted Rank Correlation
Problem Statement: There have been many cases in real life where two independent sources have ranked n objects, with the interest focused on agreement in the top rankings. Spearman's rho and Kendall's tau coefficients assigned equal weights to all rankings. As a result, the literature proposed several weighted correlation coefficients with emphasis on the top rankings, including the top-down, w...
Estimation of Kendall's tau from censored data
This paper considers the nonparametric estimation of Kendall's tau for bivariate censored data. Under censoring, there have been some papers discussing the nonparametric estimation of Kendall's tau, such as Wang and Wells (2000), Oakes (2008) and Lakhal, Rivest and Beaudoin (2009). In this article, we consider an alternative approach to estimate Kendall's tau. The main idea is to replace a cens...
A Clustered Index Approach to Distributed XPath
Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, these languages require schema knowledge so as to write an appropriate query which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Furthe...
Abstract structure of partial function $*$-algebras over semi-direct product of locally compact groups
This article presents a unified approach to the abstract notions of partial convolution and involution in $L^p$-function spaces over semi-direct product of locally compact groups. Let $H$ and $K$ be locally compact groups and $\tau:H\to Aut(K)$ be a continuous homomorphism. Let $G_\tau=H\ltimes_\tau K$ be the semi-direct product of $H$ and $K$ with respect to $\tau$. We define left and right $\tau$-c...
Journal: CoRR
Volume: abs/1409.0651
Publication date: 2014